Facilitated diffusion of DNA-binding proteins 
The diffusion-controlled limit of reaction times for site-specific DNA-binding proteins is derived 
from first principles. We follow the generally accepted concept that a protein propagates via two 
competing modes, three-dimensional diffusion in space and one-dimensional sliding along the 
DNA. However, our theoretical treatment of the problem is new. The accuracy of our analytical 
model is verified by numerical simulations. The results confirm that non-specific binding of the protein 
to DNA, combined with sliding, can reduce the reaction times significantly. 
Introduction. The understanding of diffusion-controlled 
chemical reactions has become an indispensable 
ingredient of present-day technological development. 
The optimization of catalysts, fuel cells, improved bat- 
teries using electrodes with nano-structured surfaces or 
the function of semi-conductive devices are just a few 
of countless examples where diffusive processes, often in 
crowded or fractal environments, are involved to define 
the most important system parameters. For any living 
organism, diffusion plays the central role in the biochemical 
and biophysical reactions that keep the system alive [1, 2]: 
The transport of molecules through cell membranes, of 
ions passing the synaptic gap or drugs on the way to their 
protein receptors are predominantly diffusive processes. 
Furthermore, essentially all of the biological functions 
of DNA are performed by proteins that interact with 
specific DNA sequences [3], [4], and these reactions are 
diffusion-controlled. 

However, it has been realized that some proteins can 
find their specific target sites on DNA much more rapidly 
than is "allowed" by the diffusion limit |]J, |fj, |fj . It is 
therefore generally accepted that some kind of facilitated 
diffusion must take place in these cases. Several mecha- 
nisms, differing in details, have been proposed for it. All 
of them essentially involve two steps. First, the protein 
binds to a random non-specific DNA site. Second, it dif- 
fuses (slides) along the DNA chain. These two steps may 
be reiterated many times before the protein actually finds 
the target, since the sliding is occasionally interrupted by 
dissociation. 

Berg et al. have provided a thorough (but somewhat 
sophisticated) theory that allows an estimation of the re- 
sulting reaction rates 0. Recently, Halford and Marko 
have presented a comprehensive review on this subject 
and proposed a remarkably simple semiquantitative approach 
that explicitly contains the mean sliding length 
as a parameter of the theory [6]. 

In the present work we suggest an alternative view on 
the problem starting from first principles. Our theory 
leads to a formula that is similar in form to that of Hal- 
ford and Marko, apart from numerical factors. In partic- 
ular, we give a new interpretation of the sliding length, 
which makes it possible to relate this quantity to exper- 
imentally accessible parameters. 

Theory. To estimate the mean time $\tau$ required for a 
protein to find its target, we consider a single DNA chain 
in a large volume $V$. At time $t = 0$, the protein molecule 
is somewhere outside the DNA coil. We introduce the 
'reaction coordinate' $r$ as the distance between the center 
of the protein and the center of the target, which is 
assumed to be present in a single copy. When $r$ is large, 
the only transport mechanism is the 3-dimensional (3d) 
diffusion in space. In contrast, at small $r$, the 
1-dimensional (1d) diffusion along the DNA chain is more 
efficient. 

Let us define the efficiency of a transport mechanism 
in stricter terms. Let $\tau(r - dr, r)$ be the mean time 
of the first arrival of the protein at the distance $(r - dr)$ 
from the target, provided it starts from the distance $r$. 
In the simple cases, when the diffusion of a particle can 
be fully characterized by a single coordinate, this time is 
given by the equation 

$$d\tau \equiv \tau(r - dr, r) = \frac{Z(r)}{D\,\rho(r)}\,dr, \tag{1}$$

where $D$ is the diffusion coefficient, $\rho(r)$ the equilibrium 
distribution function of the particle along the reaction 
coordinate (not necessarily normalized), and $Z(r)$ the local 
normalizing factor 

$$Z(r) = \int_r^{\infty} \rho(r')\,dr'. \tag{2}$$

Note that the quantity $1/d\tau$ is the average frequency of 
transitions $r \to r - dr$ in the 'reduced' system with a 
reflecting boundary at the position $r - dr$ (so that the 
smaller distances from the target are forbidden). The 
quantity 

$$v(r) \equiv \frac{dr}{d\tau} = \frac{D\,\rho(r)}{Z(r)} \tag{3}$$

has the dimension of velocity and can be regarded as a 
measure for the efficiency of a transport process. 

For 3d-diffusion inside the volume $V$, we have $\rho(r) = 4\pi r^2 c$, 
where $c$ is the protein concentration and the factor 
$4\pi$ is chosen to provide a convenient normalization for a 
system containing only one protein molecule: $Z(0) = Vc = 1$. 
Hence, for sufficiently small $r$, when $Z(r) \approx Z(0) = 1$, 
the transport efficiency is 

$$v_{3d}(r) = 4\pi D_{3d}\, r^2 c. \tag{4}$$

In the case of 1d-diffusion along the DNA chain we 
have $\rho(r) = 2\sigma$, with $\sigma$ being the linear density of 
non-specifically bound protein. The factor 2 accounts for the 
fact that the target can be reached from two opposite 
directions. We assume, again, that the distance $r$ is sufficiently 
small, so that the DNA axis can be considered 
as a straight line. Thus, the efficiency of the 1d-diffusive 
transport near the target is given by 

$$v_{1d} = 2 D_{1d}\,\sigma. \tag{5}$$

Our main assumption is that, during the combined diffusion 
process, the probability of the (non-specifically) 
bound state is close to its equilibrium value for each given 
value of $r$. Then the frequencies $1/d\tau_{3d}$ and $1/d\tau_{1d}$ are 
additive, and so are the efficiencies of the two transport 
mechanisms given by Eqs. (4) and (5). Hence, the mean 
time of the first arrival at the target of radius $a$ can be 
found as 

$$\tau = \int_a^{\infty} \frac{dr}{v_{3d} + v_{1d}}. \tag{6}$$

The main contribution to this integral is made by the 
distances close to $a$. For that reason, the upper limit of 
integration is set to infinity. Before evaluation of Eq. (6), 
we note that 

$$1 = Z(0) = Vc + L\sigma, \tag{7}$$

where $V$ is the volume and $L$ is the DNA length. The 
meaning of this equation is that the system contains only 
one protein molecule. Substituting Eqs. (4) and (5) into 
Eq. (6) and taking into account Eq. (7), we get, finally, 

$$\tau = \left(\frac{V}{8 D_{3d}\,\xi} + \frac{\pi L\,\xi}{4 D_{1d}}\right)\left[1 - \frac{2}{\pi}\arctan\left(\frac{a}{\xi}\right)\right]. \tag{8}$$

Here, we have introduced a new parameter 

$$\xi = \sqrt{\frac{D_{1d} K}{2\pi D_{3d}}}, \tag{9}$$

with $K = \sigma/c$ being the equilibrium constant of non-specific 
binding. It is easy to verify that $\xi$ is just the 
distance at which the efficiencies of the two transport mechanisms 
[Eqs. (4) and (5)] become equal to each other. 
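
The step from Eq. (6) to Eq. (8) can be retraced as follows. This is a sketch reconstructed from Eqs. (4)-(7) and (9), not the authors' own working. The last line is a consistency check: without sliding ($\xi \to 0$), the result should reduce to the classical diffusion-limited search time for a target of radius $a$.

```latex
\begin{align*}
\tau &= \int_a^{\infty} \frac{dr}{4\pi D_{3d}\,c\,r^2 + 2 D_{1d}\,\sigma}
      = \frac{1}{4\pi D_{3d}\,c} \int_a^{\infty} \frac{dr}{r^2 + \xi^2},
      \qquad \xi^2 = \frac{D_{1d}\,\sigma}{2\pi D_{3d}\,c} = \frac{D_{1d} K}{2\pi D_{3d}} \\
     &= \frac{1}{4\pi D_{3d}\,c\,\xi}\left[\frac{\pi}{2} - \arctan\frac{a}{\xi}\right]
      = \frac{1}{8 D_{3d}\,c\,\xi}\left[1 - \frac{2}{\pi}\arctan\frac{a}{\xi}\right] \\
\frac{1}{c} &= V + L K \quad \text{(Eq. (7) with } \sigma = K c\text{)},
      \qquad \frac{L K}{8 D_{3d}\,\xi} = \frac{\pi L\,\xi}{4 D_{1d}} \\
\tau &\xrightarrow{\;\xi \to 0\;} \frac{V}{8 D_{3d}\,\xi}\cdot\frac{2\xi}{\pi a} = \frac{V}{4\pi D_{3d}\,a}
\end{align*}
```

The second line uses $\int dr/(r^2+\xi^2) = \xi^{-1}\arctan(r/\xi)$. The limit in the last line follows from $1 - (2/\pi)\arctan(a/\xi) = (2/\pi)\arctan(\xi/a) \approx 2\xi/(\pi a)$ for small $\xi$.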

Numerical model. In what follows we present numerical 
simulations to test the accuracy of our analytical 
result for the reaction time given by Eqs. (8) and (9). 
In order to approximate the real biological situation, the 
DNA was modeled by a chain of $N$ straight segments of 
equal length $l_0$. Its mechanical stiffness was defined by 
the bending energy associated with each chain joint: 

$$E_b = k_B T\,\alpha\,\theta^2, \tag{10}$$

where $k_B T$ is the thermal energy, $\alpha$ the dimensionless 
stiffness parameter, and $\theta$ the bending angle. The numerical 
value of $\alpha$ defines the persistence length, i.e. the 
"stiffness" of the chain. The excluded volume effect 
was taken into account by introducing the effective DNA 
diameter, $d_{eff}$. The conformations of the chain with 
distances between non-adjacent segments smaller than 
$d_{eff}$ were forbidden. The target of specific binding was 
assumed to lie exactly in the middle of the DNA. The 
whole chain was packed in a spherical volume (cell) of 
radius R in such a way that the target occupied the cen- 
tral position. 

In order to achieve a close packing of the chain inside 
the cell, we first generated a relaxed conformation of the 
free chain by the standard Metropolis Monte-Carlo (MC) 
method. For further compression, we defined the center- 
norm (c-norm) as the maximum distance from the target 
(the middle point) to the other parts of the chain. Then, 
the MC procedure was continued, but an MC step was 
rejected if the c-norm exceeded 105% of the lowest 
value registered so far. The procedure was stopped when 
the desired degree of compaction was obtained. 
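
The compression criterion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the representation of the chain as a list of vertex coordinates are assumptions, and the usual Metropolis energy test and the excluded-volume check are omitted. Because the distance to a fixed point is a convex function along a straight segment, the maximum over the whole chain is reached at a vertex, so only vertices are inspected.

```python
import math

def c_norm(points, target_index):
    """Maximum distance from the target vertex to any other chain vertex."""
    target = points[target_index]
    return max(math.dist(target, p) for p in points)

def accept_compression_step(trial_points, target_index, lowest_so_far, tolerance=1.05):
    """Apply the c-norm rule to a trial conformation.

    Returns (accepted, new_lowest). The trial is rejected when its c-norm
    exceeds tolerance * lowest_so_far (105% in the text).
    """
    norm = c_norm(trial_points, target_index)
    if norm > tolerance * lowest_so_far:
        return False, lowest_so_far
    return True, min(lowest_so_far, norm)
```

Allowing the c-norm to rise slightly above the best value seen so far lets the chain keep rearranging while the bound on its extent tightens gradually.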

The protein was modeled as a random walker within 
the cell with reflecting boundaries. During one step in the 
free 3d-mode, it was displaced by the distance $\varepsilon_{3d}$ in a 
random direction. Once the walker approached the chain 
closer than a certain capture radius $r_c$, it was placed at 
the nearest point on the chain and its movement mode 
was changed to 1d-sliding along the chain contour. 
In this mode, a step represented a displacement by the 
distance $\varepsilon_{1d}$ performed with equal probability in either 
direction. The ends of the chain were reflective. 
After each 1d-step (and immediately after the capture) 
the walker could jump off the chain by the distance $r_c$ 
and reenter the 3d-mode. This operation was carried out 
with the kick-off probability $p$. 
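
The elementary moves of the walker can be sketched as below. This is an illustration under stated assumptions, not the simulation code used for the results: the function names are hypothetical, the sliding position is represented by a contour coordinate `s` between 0 and the chain length, and reflection at the chain ends is implemented by mirroring, which the text does not specify. Capture by the chain and reflection at the cell wall are left out.

```python
import math
import random

def step_3d(position, eps_3d, rng=random):
    """Displace a 3d position by eps_3d in a uniformly random direction."""
    z = rng.uniform(-1.0, 1.0)             # cosine of the polar angle
    phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuthal angle
    s = math.sqrt(1.0 - z * z)
    x0, y0, z0 = position
    return (x0 + eps_3d * s * math.cos(phi),
            y0 + eps_3d * s * math.sin(phi),
            z0 + eps_3d * z)

def step_1d(s, eps_1d, contour_length, rng=random):
    """Move the contour coordinate by +/- eps_1d, mirroring at the chain ends."""
    s += eps_1d if rng.random() < 0.5 else -eps_1d
    if s < 0.0:
        s = -s
    elif s > contour_length:
        s = 2.0 * contour_length - s
    return s

def kicks_off(p, rng=random):
    """Decide whether the bound walker leaves the chain (probability p)."""
    return rng.random() < p
```

Drawing the cosine of the polar angle uniformly, rather than the angle itself, is what makes the direction uniform on the sphere.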

A simulation cycle started with the walker at the pe- 
riphery of the cell and ended when the walker came 
within the distance a to the target. During all simulation 
cycles the chain conformation remained fixed. 

FIG. 1: Reaction time $\tau$ as a function of the sliding parameter 
$\xi$ [Eq. (9)] at a fixed cell radius $R = 2$ and chain lengths 
$L = 56, 40, 24, 8$ (top to bottom). The curves are plots of 
Eq. (8). 

In the following, one step is chosen as the unit 
of time and one persistence length of the DNA chain 
(50 nm) as the unit of distance. The following values of 
parameters were used. The length of one segment was 
chosen as $l_0 = 0.2$, so that one persistence length was 
partitioned into 5 segments. The corresponding value of 
the stiffness parameter was $\alpha = 2.403$ [9]. The effective 
chain diameter was $d_{eff} = 0.12$, the capture radius $r_c = d_{eff}/2$, 
and the radius of the active site was $a = 0.08$. 
The diffusion coefficients are defined as $D_{3d} = \varepsilon_{3d}^2/6$ and 
$D_{1d} = \varepsilon_{1d}^2/2$. The step sizes of the walker were $\varepsilon_{3d} = 0.04$ 
and $\varepsilon_{1d} = \varepsilon_{3d}/\sqrt{3}$, yielding identical diffusion coefficients 
$D_{3d} = D_{1d} = 8 \cdot 10^{-4}/3$. 
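
As a quick arithmetic check of the step sizes quoted above, the following lines evaluate the two diffusion coefficients in simulation units (one step as the unit of time, one persistence length as the unit of distance). The variable names are chosen here for illustration.

```python
import math

eps_3d = 0.04                    # 3d step size
eps_1d = eps_3d / math.sqrt(3)   # 1d step size

D_3d = eps_3d**2 / 6   # random walk in three dimensions, unit time step
D_1d = eps_1d**2 / 2   # random walk in one dimension, unit time step
```

Both expressions reduce to $0.0016/6 = 8 \cdot 10^{-4}/3$, so the choice $\varepsilon_{1d} = \varepsilon_{3d}/\sqrt{3}$ is exactly what equalizes the two coefficients.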

The radius $R$ of the cell and the DNA length $L$ were 
varied in different sets of simulations. For each fixed pair 
$(R, L)$, the kick-off probability was initially set to $p = 1$ 
(no 1d-transport, $\xi = 0$) and subsequently reduced to 
$p_i = 2^{-i}$, $i = 1, 2, \ldots, 11$. For each parameter set, the 
simulation cycle was repeated 2000 times. The equilibrium 
constant $K$ required for the calculation of the parameter 
$\xi$ [Eq. (9)] was determined as the ratio 
$V\tau_{1d}/(L\tau_{3d})$, where $\tau_{1d}$ and $\tau_{3d}$ are the average times the 
walker spent in the bound and the free states, respectively. 
Note that $\xi$ depends on the choice of the probability $p$, 
but not on cell size or chain length, since $\tau_{1d} \sim L$ 
and $\tau_{3d} \sim V$. For each choice of $p$, the constant $K$ was 
determined in a special long simulation run without a target 
for specific binding. 
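
The conversion from measured occupancy times to the sliding parameter can be written compactly. The sketch below assumes that the bound and free times have already been accumulated in a target-free run; the function and variable names are illustrative and not taken from the original work.

```python
import math

def equilibrium_constant(V, L, t_bound, t_free):
    """K = sigma / c = (t_bound / L) / (t_free / V) for a single walker."""
    return V * t_bound / (L * t_free)

def sliding_parameter(K, D_1d, D_3d):
    """Sliding parameter xi = sqrt(D_1d * K / (2 * pi * D_3d)), Eq. (9)."""
    return math.sqrt(D_1d * K / (2.0 * math.pi * D_3d))

# Kick-off probabilities scanned in the text: p = 1, then p_i = 2**-i, i = 1..11.
kick_off_probabilities = [1.0] + [2.0**-i for i in range(1, 12)]
```

Since the bound time scales with $L$ and the free time with $V$, the ratio computed by `equilibrium_constant` is a property of the kick-off probability alone.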

FIG. 2: Reaction time $\tau$ as a function of the sliding parameter 
$\xi$ [Eq. (9)] at fixed chain length $L = 56$ and with varying cell 
volumes (8x, 4x, 2x and 1x the original volume $V_0 = 32\pi/3$, 
top to bottom). The curves are plots of Eq. (8). 

Results. In a first set of simulations, chains of various 
lengths between $L = 8$ and $L = 56$ were packed into a cell 
of radius $R = 2$ and volume $V = 4\pi R^3/3 = 32\pi/3$. The 
resulting averaged reaction times $\tau$ are plotted in Fig. 1 
as a function of the variable $\xi$ [Eq. (9)]. The curves are 
plots of Eq. (8). The above relation evidently 
reproduces the simulation results on a quantitative 
level. This good agreement between the theoretical 
and the computational model indicates that the derivation 
of Eq. (8), although quite simple, already contains the 
essential ingredients of the underlying transport process. 
A moderate deviation between simulation and theory is 
visible in the case of $L = 56$ and large values of $\xi$. In the 
discussion we will briefly touch on the limits of the theoretical 
approach when $\xi$ becomes very large. With the present 
selection of chain parameters, the results show that 
1d-sliding can reduce the reaction time significantly. 
If, however, the non-specific binding becomes too strong, 
its effect is reversed and the reaction time 
increases. The most efficient transport is achieved with 
a balanced contribution of both 1d- and 3d-diffusion. 
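
The non-monotonic dependence of the reaction time on the sliding parameter can be illustrated by evaluating Eq. (8) directly. The sketch below uses the parameter values quoted in the text for the densest system ($R = 2$, $L = 56$, $a = 0.08$, $D_{3d} = D_{1d} = 8 \cdot 10^{-4}/3$). The function names are illustrative, and the no-sliding reference time is the $\xi \to 0$ limit of Eq. (8), not a simulation result.

```python
import math

def reaction_time(xi, V, L, a, D_3d, D_1d):
    """Mean reaction time from Eq. (8); xi must be positive."""
    prefactor = V / (8.0 * D_3d * xi) + math.pi * L * xi / (4.0 * D_1d)
    return prefactor * (1.0 - (2.0 / math.pi) * math.atan(a / xi))

def reaction_time_no_sliding(V, a, D_3d):
    """Limit xi -> 0 of Eq. (8): pure 3d search for a target of radius a."""
    return V / (4.0 * math.pi * D_3d * a)

# Parameters quoted in the text (units: persistence length, simulation step).
D = 8e-4 / 3
V = 32.0 * math.pi / 3.0
L = 56.0
a = 0.08

tau_0 = reaction_time_no_sliding(V, a, D)
for xi in (0.05, 0.2, 0.5, 2.0):
    ratio = reaction_time(xi, V, L, a, D, D) / tau_0
    print(f"xi = {xi:4.2f}  tau / tau_0 = {ratio:.3f}")
```

With these inputs the ratio first drops below one and then rises above it for large $\xi$, in line with the statement that overly strong non-specific binding slows the search.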

Figure 2 displays the results of a second set of simulations, 
where the longest chain of $L = 56$ was placed into 
cells with volumes of two, four and eight times the initial 
value $V_0 = 32\pi/3$, leading to systems of rather sparse 
chain densities. The plots of Eq. (8) are again in good 
overall agreement with the simulation results, although a 
systematic deviation in the case of large cell volumes, i.e. at 
low chain densities, is visible. The theoretical approach 
seems to under-predict the reaction time by up to 10%. 
A systematic investigation of the limits of our approach is 
part of ongoing research. For the time being we note that 
in crowded environments (of high chain density) Eq. (8) 
appears to be more accurate than in sparse environments. 

Discussion. Recently, Halford and Marko have proposed 
a remarkably simple semiquantitative approach to 
estimate the reaction time [6], 
yielding the expression 

$$\tau = \frac{V}{D_{3d}\, l_{sl}} + \frac{L\, l_{sl}}{D_{1d}}. \tag{11}$$

Following their argumentation, $l_{sl}$ was interpreted as the 
average sliding length of the protein on the DNA contour. 
It is instructive to note that, for $\xi \gg a$, Eq. (8) turns into 

$$\tau = \frac{V}{8 D_{3d}\,\xi} + \frac{\pi L\,\xi}{4 D_{1d}}, \tag{12}$$

which is of identical functional form if we identify $\xi$ with 
the sliding length of Halford and Marko. With Eq. (9) 
we can now express $l_{sl}$ in terms of experimentally accessible 
quantities, assigning a physical meaning to a previously 
heuristic model parameter. Additionally, Eq. (12) 
contains the numerical factors which turn the initially 
semiquantitative approach into a model of quantitative 
accuracy. 

Our results demonstrate (Fig. 2) that crowding decreases 
the optimum sliding length: the shortest reaction 
time is reached at lower non-specific binding affinities. 
In a crowded environment the chance for the protein to 
bind or re-bind non-specifically is much higher, so that 
the period of free diffusion is shorter after each kick. In 
contrast, in sparse environments the chance to hit the 
target is increased if the protein remains in sliding mode 
over a rather long distance. Increasing the chain density 
will shift the minimum of $\tau$ to lower values of $\xi$ (Fig. 2), 
while decreasing the chain length at constant volume will 
shift it to higher values (Fig. 1). Setting the derivative of Eq. (12) 
with respect to $\xi$ to zero gives an estimate of the optimum sliding length $\xi_{opt}$: 

$$\xi_{opt} = \sqrt{\frac{V D_{1d}}{2\pi L D_{3d}}}. \tag{13}$$
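
The optimum in Eq. (13) can be checked numerically against Eq. (12). The sketch below is illustrative only: the function names are hypothetical, the parameters are those of the simulated systems ($L = 56$, $V_0 = 32\pi/3$, equal diffusion coefficients), and Eq. (12) itself holds only for $\xi$ well above the target radius $a$, so the estimate is approximate for these systems.

```python
import math

def tau_asymptotic(xi, V, L, D_3d, D_1d):
    """Eq. (12): reaction time for xi much larger than the target radius."""
    return V / (8.0 * D_3d * xi) + math.pi * L * xi / (4.0 * D_1d)

def xi_optimal(V, L, D_3d, D_1d):
    """Eq. (13): the value of xi that minimizes Eq. (12)."""
    return math.sqrt(V * D_1d / (2.0 * math.pi * L * D_3d))

D = 8e-4 / 3
V0 = 32.0 * math.pi / 3.0
for volume_factor in (1, 2, 4, 8):
    xi = xi_optimal(volume_factor * V0, 56.0, D, D)
    print(f"V = {volume_factor} V0: xi_opt = {xi:.3f}")
```

Because $\xi_{opt}$ scales as $\sqrt{V/L}$, diluting the chain moves the optimum to longer sliding lengths, which matches the trend described for Fig. 2.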

Sliding distances have been estimated experimentally 
to be up to 1000 bp for the restriction endonuclease EcoRV 
in dilute solution from the dependence of cleavage rate 
on DNA length [10], but from the same enzyme's processivity 
a much shorter sliding length of about 50 bp was 
estimated later [11]. The DNA concentration in the latter 
work was 5 nM for a 690 bp DNA, while the highest 
chain density used here was 0.4 nM for $L = 56$ persistence 
lengths, corresponding to an 8230 bp DNA. For the DNA 
length and concentration used in [11], $\xi_{opt} = 0.22$, or 33 
bp. We thus see that the relatively short sliding lengths 
estimated in more recent work make good sense for the 
biological function of DNA-binding proteins, since they 
constitute the best compromise between one- and three- 
dimensional search. 

The limits of our new approach are presently under 
investigation. In the derivation of Eq. (8) we assumed 
chemical equilibrium between the free and the 
non-specifically bound states of the walker. For high 
affinity of the protein to the DNA, i.e. large values of 
$\xi$, this assumption may not be justified, since the protein 
always starts in free diffusion mode at the periphery of 
the cell. The violation of that assumption may become 
more serious if the chain density inside the cell is low, 
so that the protein has to search for a long time before 
it is able to bind to the DNA for the first time. Additionally, 
in order to evaluate the efficiency of 1d-diffusion 
[Eq. (5)], it was assumed that the DNA axis could be considered 
as a straight line over the distance of 1d-diffusion. 
This is satisfied if the sliding length is smaller than the 
persistence length of the chain, i.e., $\xi < 1$. 

In summary, the relation (8), derived from first principles, 
provides a quantitative estimate for the reaction 
time of a protein that is moving under the control of 
two competing transport mechanisms in a crowded environment. 
Although it draws an idealized picture of the 
living cell, it will serve as the starting point for more realistic 
approaches, equipped with additional parameters 
that are subsequently calibrated in sophisticated simulations. 
The sliding parameter $\xi$ [Eq. (9)] connects the 
heuristic sliding length of Halford et al. to experimentally 
accessible quantities. The simulations, although so 
far performed on a limited range of system parameters, 
confirm earlier results that non-specific binding combined 
with a 1d-diffusion mode enables a significant 
speed-up of the reaction. The relation (8) can be used to 
extend the investigations to system sizes which are not 
easily accessible in numerical simulations such as those 
presented in this work: The size of a realistic cell nucleus 
is of the order of ten microns and it contains DNA chains 
adding up to a length of the order of meters. 

We thank J. F. Marko for fruitful discussions. 